This is a document describing Kara’s initial exploration of the data from Packets 1-3.
These packets included 29 scales or subscales, including the following:
- Absorption (Tellegen); range: 0 to 34
- Absorption extra (a handful of extra questions extra questions); range: 0 to 4
- Daily spiritual experiences #1-4 (Underwood & Teresi); range: 0 to 20
- Daily spiritual experiences #5-14 (Underwood & Teresi); range: 0 to 50
- Daily spiritual experiences #5-14 (Thai) (additional questions asked only in Thailand); range: 0 to 60
- Daily spiritual experiences #15-16 (Underwood & Teresi); range: 0 to 10
- Daily spiritual experiences #15-16 (Thai) (additional questions asked only in Thailand); range: 0 to 30
- Spiritual events (Luhrmann); range: 0 to 88
- Sensory seeking (Brown et al.); range: -28 to 28
- Body awareness (Shields et al.); range: -36 to 36
- Attention to feelings (Salovey et al.); range: -42 to 42
- Hallucination (Alderson-Day); range: 0 to 27
- VISQ: dialogic speech (McCarthy-Jones & Fernyhough); range: -8 to 8
- VISQ: inner speech (McCarthy-Jones & Fernyhough); range: -10 to 10
- VISQ: evaluative/motivational speech (McCarthy-Jones & Fernyhough); range: -8 to 8
- Inner speech (Hardy/Bentall); range: -20 to 20
- Hearing events (Posey & Losch); range: 0 to 18
- Encoding style (Lewicki); range: 0 to 40
- Mind metaphors (Van Elk); range: -16 to 16
- Metacognition: lack of cognitive confidence (Wells et al.); range: -12 to 12
- Metacognition: positive beliefs re: worrying (Wells et al.); range: -12 to 12
- Metacognition: cognitive self-consciousness (Wells et al.); range: -12 to 12
- Metacognition: uncontrollability/danger (Wells et al.); range: -12 to 12
- Metacognition: need to control thoughts (Wells et al.); range: -12 to 12
- Dualism: mental states (Weisman); range: 0 to 8
- Dualism: life events (Weisman); range: 0 to 5
- Dualism: inanimate consciousness (Weisman); range: 0 to 6
- Dualism: minds, selves, & world (Weisman); range: 0 to 9
- Dualism: epistemology (Weisman); range: 0 to 5
Means by site
First, let’s compare the mean responses to each subscale across the 5 sites. I’ll plot each subscale in a separate mini-plot (denoted in the title of each mini-plot). The x-axis and color denote the site (US, Ghana, Thailand, China, or Vanuatu). Note the the range y-axis varies between mini-plots. The dot shows what the mean response was for that subscale for that site, and the error bars show a 95% confidence interval for that mean.
Joining, by = c("ctry", "packet", "subscale")

Cluster analysis
… of sites
Now let’s try a formal analysis for determining how similar these 5 sites are to each other.
Hierarchical clustering basically works like this: If there are 5 things - A, B, C, D, and E - this analysis will try to group pairs of like things together. E.g., first it might pair A and D together, then it will pretend that there are just 4 things - AD, B, C, and E. Then it might pair B and C together, and then pretend that there are just 3 things - AD, BC, and E. Then it might pair AD and BC together, and then pretend that there are just 2 things: ADBC and E. It will do this until there is just one “thing” left.
So to read this plot, look for the pairings. Sites that are on the same “branch” in this “dendrogram” showed similar patterns of means on the subscales. The closer together they are on that branch, the more similar they were.

… of subscales
Now let’s do the same thing for the subscales: Which subscales “hang together”?
To read this plot, again, look for the pairings. Subscales that are on the same “branch” in this “dendrogram” showed similar patterns of means across the 5 sites. The closer together they are on that branch, the more similar they were.

Look at correlations among subscales by site
Now let’s take a closer look at which subscales seem to “hang together,” by looking at the correlations between scales in their means for each site.
To read this plot, find one scale on the x-axis, and another scale on the y-axis. The value (and color) at that point in the grid shows you the correlation between these two subscale means across the 5 sites. The maximum possible correlation is +1 (red), and the minimum is -1 (blue).
I ordered the scales according to the cluster analysis we just did above, so you can see “patches” of scales that all tended to have similar patterns of means across sites.

Look at correlations among subscales by individuals
We could do the same thing thinking about individual participants instead of sites - but it’s important to keep in mind that most people didn’t fill out all 29 subscales! So we’ll focus on just looking at how the subscales within each packet (Packet 1, 2, or 3) “hang together” for the people who completed that packet.
As above, to read these plots, find one scale on the x-axis, and another scale on the y-axis. The value (and color) at that point in the grid shows you the correlation between these two subscale means across the 5 sites. The maximum possible correlation is +1 (red), and the minimum is -1 (blue). (Note that these scales are being presented in alphabetical order, not by any sort of clustering analysis.)
Packet 1
Setting row names on a tibble is deprecated.

Packet 2
Setting row names on a tibble is deprecated.

Packet 3
Setting row names on a tibble is deprecated.

All packets
Just for fun, here’s a look at the correlations among individual participants’ subscale scores across all scales. Note that some of these pairs of subscales probably have very few observations going into these correlations!! So take this with a grain of salt.
Setting row names on a tibble is deprecated.

Focusing on absorption & spiritual events
Let’s look at relationships between absorption & spiritual events across individual participants, but separating out by site. Tanya’s strong prediction would be that higher absorption (as captured by the Tellegen scale, exwl) would be correlated with more spiritual events (as indxed by the Luhrmann scale, spev).
Here’s a test for the overall correlation:
Pearson's product-moment correlation
data: exwl and spev
t = 8.3864, df = 485, p-value = 5.495e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2757416 0.4310991
sample estimates:
cor
0.3558766
Looks like this prediction is born out by the data! Absorption was significantly positively correlated with reported spiritual events (\(r\) = 0.36, \(p\) = <0.001), accounting for approximately 13% of the variance in reported spirtual events.
Now let’s take a look at what’s going on in each site with a regression analysis:
| baserate (overall) |
17.54 |
0.53 |
33.1 |
<0.001 |
| baserate (ghana vs. overall) |
16.05 |
1.10 |
14.6 |
<0.001 |
| baserate (thailand vs. overall) |
-6.17 |
0.98 |
-6.3 |
<0.001 |
| baserate (china vs. overall) |
-10.79 |
1.01 |
-10.7 |
<0.001 |
| baserate (vanuatu vs. overall) |
4.88 |
1.19 |
4.1 |
<0.001 |
| absorption (overall) |
0.96 |
0.09 |
11.1 |
<0.001 |
| absorption (ghana vs. overall) |
0.47 |
0.16 |
3.0 |
0.003 |
| absorption (thailand vs. overall) |
-0.25 |
0.18 |
-1.4 |
0.154 |
| absorption (china vs. overall) |
-0.36 |
0.18 |
-2.0 |
0.044 |
| absorption (vanuatu vs. overall) |
0.23 |
0.19 |
1.2 |
0.230 |
These results indicate that, as predicted, there was a significant positive relationship between absorption and reported spiritual events, collapsing across sites (\(b\) = 0.96, \(p\) = <0.001). The relationship between absorption and spiritual events was particularly strong among participants in Ghana (\(b\) = 0.47, \(p\) = 0.003), and (perhaps) weaker than average among participants in China (\(b\) = -0.36, \(p\) = 0.044).
Note also that there are overall differences in the baserates for reporting spiritual events: Participants in Ghana and Vanuatu generally reported more than average (Ghana: \(b\) = 16.05, \(p\) = <0.001; Vanuatu: \(b\) = 4.88, \(p\) = <0.001), while participants in Thailand and China generally reported fewer than average (Thailand: \(b\) = -6.17, \(p\) = <0.001; China: \(b\) = -10.79, \(p\) = <0.001).
Here’s a visualization of these relationships:

Dualism
Here’s just some initial attempt to take a look at how the dualism questions hang together… more on this later.
Joining, by = "question"
Column `question` joining character vector and factor, coercing into character vector

---
title: 'SC data entry: Packets 1-3'
subtitle: 'KW initial exploration (updated with data from 2018-01-31)'
output:
  html_notebook: default
  html_document:
    df_print: paged
  pdf_document: default
---

```{r, include = FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
```

```{r, include = FALSE}
# set working director
# setwd("/Users/kweisman/Documents/Research (Stanford)/Projects/Templeton Grant/DATA WRANGLING/templeton_packets/packets123/")

# load packages
library(tidyverse)
library(rms)
library(ggdendro)

# load question key (including manual reverse-coding)
question_key <- read.csv("//Users/kweisman/Documents/Research (Stanford)/Projects/Templeton Grant/DATA WRANGLING/templeton_packets/packets123/packets123_question_key_byhand.csv")

# load data
d_wide <- read_csv("./packets123_data_byquestion_wide.csv") %>%
  mutate(ctry = factor(ctry, 
                       levels = c("us", "ghana", "thailand", "china", "vanuatu")))
d_wide_subscale <- read_csv("./packets123_data_bysubscale_wide.csv") %>%
  mutate(ctry = factor(ctry, 
                       levels = c("us", "ghana", "thailand", "china", "vanuatu")))
d_long <- read_csv("./packets123_data_byquestion_long.csv") %>%
  mutate(ctry = factor(ctry, 
                       levels = c("us", "ghana", "thailand", "china", "vanuatu")))
d_long_subscale <- read_csv("./packets123_data_bysubscale_long.csv") %>%
  mutate(ctry = factor(ctry, 
                       levels = c("us", "ghana", "thailand", "china", "vanuatu")))

# make custom functions
round2 <- function(x) {format(round(x, 2), digits = 2)}
```

This is a document describing Kara's initial exploration of the data from Packets 1-3. 

These packets included `r length(levels(factor(d_long_subscale$subscale))) - 1` scales or subscales, including the following:

- **Absorption** (Tellegen); range: 0 to 34
- **Absorption extra** (a handful of extra questions extra questions); range: 0 to 4
- **Daily spiritual experiences #1-4** (Underwood & Teresi); range: 0 to 20
- **Daily spiritual experiences #5-14** (Underwood & Teresi); range: 0 to 50
- **Daily spiritual experiences #5-14 (Thai)** (additional questions asked only in Thailand); range: 0 to 60
- **Daily spiritual experiences #15-16** (Underwood & Teresi); range: 0 to 10
- **Daily spiritual experiences #15-16 (Thai)** (additional questions asked only in Thailand); range: 0 to 30
- **Spiritual events** (Luhrmann); range: 0 to 88
- **Sensory seeking** (Brown et al.); range: -28 to 28
- **Body awareness** (Shields et al.); range: -36 to 36
- **Attention to feelings** (Salovey et al.); range: -42 to 42
- **Hallucination** (Alderson-Day); range: 0 to 27
- **VISQ: dialogic speech** (McCarthy-Jones & Fernyhough); range: -8 to 8
- **VISQ: inner speech** (McCarthy-Jones & Fernyhough); range: -10 to 10
- **VISQ: evaluative/motivational speech** (McCarthy-Jones & Fernyhough); range: -8 to 8
- **Inner speech** (Hardy/Bentall); range: -20 to 20
- **Hearing events** (Posey & Losch); range: 0 to 18
- **Encoding style** (Lewicki); range: 0 to 40
- **Mind metaphors** (Van Elk); range: -16 to 16
- **Metacognition: lack of cognitive confidence** (Wells et al.); range: -12 to 12
- **Metacognition: positive beliefs re: worrying** (Wells et al.); range: -12 to 12
- **Metacognition: cognitive self-consciousness** (Wells et al.); range: -12 to 12
- **Metacognition: uncontrollability/danger** (Wells et al.); range: -12 to 12
- **Metacognition: need to control thoughts** (Wells et al.); range: -12 to 12
- **Dualism: mental states** (Weisman); range: 0 to 8
- **Dualism: life events** (Weisman); range: 0 to 5
- **Dualism: inanimate consciousness** (Weisman); range: 0 to 6
- **Dualism: minds, selves, & world** (Weisman); range: 0 to 9
- **Dualism: epistemology** (Weisman); range: 0 to 5

# Means by site

First, let's compare the mean responses to each subscale across the 5 sites. I'll plot each subscale in a separate mini-plot (denoted in the title of each mini-plot). The x-axis and color denote the site (US, Ghana, Thailand, China, or Vanuatu). Note the the range y-axis varies between mini-plots. The dot shows what the mean response was for that subscale for that site, and the error bars show a 95% confidence interval for that mean. 

```{r}
d_long_subscale_boot <- d_long_subscale %>%
  filter(!is.na(sum_score)) %>%
  group_by(ctry, packet, subscale) %>%
  do(data.frame(rbind(smean.cl.boot(.$sum_score)))) %>%
  ungroup() %>%
  filter(subscale != "attn") %>%
  left_join(d_long_subscale %>%
              filter(!is.na(sum_score)) %>%
              count(ctry, packet, subscale)) %>%
  mutate(packet = paste("packet", packet),
         ctry = factor(ctry,
                       levels = c("us", "ghana", 
                                  "thailand", "china", "vanuatu")),
         subscale = 
           factor(subscale,
                  levels = c("exwl", 
                             "exwl_extra",
                             "dse_1to4", 
                             "dse_5to14", 
                             "dse_5to14_thai",
                             "dse_15to16", 
                             "dse_15to16_thai",
                             "spev", 
                             "sen_sensory_seeking", 
                             "sen_body_awareness", 
                             "sen_trait_metamood",
                             "her2_hallucination",
                             "invo_VISQ_dialogic_speech", 
                             "invo_VISQ_inner_speech",
                             "invo_VISQ_eval_motiv_inner_speech",
                             "invo_hardy_bentall",
                             "her_posey_losch",
                             "enco_lewicki",
                             "meta_van_elk",
                             "tat_confidence",
                             "tat_positive_beliefs",
                             "tat_cognitive",
                             "tat_uncontrollability",
                             "tat_need_control",
                             "minw_mental_states",
                             "minw_life_events",
                             "minw_inanimate",
                             "minw_selves_souls_world",
                             "minw_epistemic"),
                  labels = c("absorption (tellegen)", 
                             "absorption (extra)",
                             "daily spiritual experiences (#1-4)",
                             "daily spiritual experiences (#5-14)",
                             "daily spiritual experiences (#5-14 thai)",
                             "daily spiritual experiences (#15-16)",
                             "daily spiritual experiences (#15-16 thai)",
                             "spiritual events", 
                             "sensory seeking",
                             "body awareness", 
                             "attention to feelings",
                             "hallucination", 
                             "VISQ: dialogic speech",
                             "VISQ: inner speech",
                             "VISQ: evaluative",
                             "inner speech",
                             "hearing events",
                             "encoding style",
                             "mind metaphors",
                             "metacog.: lack of cognitive confidence",
                             "metacog.: positive beliefs re: worrying",
                             "metacog.: cognitive self-consciousness",
                             "metacog.: uncontrollability/danger",
                             "metacog.: need to control thoughts",
                             "dualism: mental states",
                             "dualism: life events",
                             "dualism: inanimate consciousness",
                             "dualism: minds, selves, & world",
                             "dualism: epistemology")))
```

```{r, fig.width = 5, fig.asp = 4}
ggplot(d_long_subscale_boot %>%
         mutate(subscale = 
                  factor(subscale,
                         labels = 
                           c("absorption\n(tellegen)\nrange: 0 to 34",
                             "absorption\n(extra)\nrange: 0 to 4",
                             "daily spiritual experiences\n(underwood & teresi; #1-4)\nrange: 0 to 20",
                             "daily spiritual experiences\n(underwood & teresi; #5-14)\nrange: 0 to 50",
                             "daily spiritual experiences\n(add'l thai versions for #5-14)\nrange: 0 to 60",
                             "daily spiritual experiences\n(underwood & teresi; #15-16)\nrange: 0 to 10",
                             "daily spiritual experiences\n(add'l thai versions for #15-16)\nrange: 0 to 30",
                             "spiritual events\n(luhrmann)\nrange: 0 to 88",
                             "sensory seeking\n(brown et al.)\nrange: -28 to 28",
                             "body awareness\n(shields et al.)\nrange: -36 to 36",
                             "attention to feelings\n(salovey et al.)\nrange: -42 to 42",
                             "hallucination\n(alderson-day)\nrange: 0 to 27",
                             "VISQ: dialogic speech\n(mccarthy-jones & fernyhough)\nrange: -8 to 8",
                             "VISQ: inner speech\n(mccarthy-jones & fernyhough)\nrange: -10 to 10",
                             "VISQ: evaluative/motivational speech\n(mccarthy-jones & fernyhough)\nrange: -8 to 8",
                             "inner speech\n(hardy/bentall)\nrange: -20 to 20",
                             "hearing events\n(posey & losch)\nrange: 0 to 18",
                             "encoding style\n(lewicki)\nrange: 0 to 40",
                             "mind metaphors\n(van elk)\nrange: -16 to 16",
                             "metacog.: lack of cognitive confidence\n(wells et al.)\nrange: -12 to 12",
                             "metacog.: positive beliefs re: worrying\n(wells et al.)\nrange: -12 to 12",
                             "metacog.: cognitive self-consciousness\n(wells et al.)\nrange: -12 to 12",
                             "metacog.: uncontrollability/danger\n(wells et al.)\nrange: -12 to 12",
                             "metacog.: need to control thoughts\n(wells et al.)\nrange: -12 to 12",
                             "dualism: mental states\n(weisman)\nrange: 0 to 8",
                             "dualism: life events\n(weisman)\nrange: 0 to 5",
                             "dualism: inanimate consciousness\n(weisman)\nrange: 0 to 6",
                             "dualism: minds, selves, & world\n(weisman)\nrange: 0 to 9",
                             "dualism: epistemology\n(weisman)\nrange: 0 to 5"))) %>%
         # filter(!grepl("thai", subscale)) %>% # get rid of thai-only scales
         mutate(packet = gsub("packet ", "P", packet)),
       aes(x = ctry, y = Mean, color = ctry)) +
  facet_wrap(~ reorder(interaction(packet, subscale, sep = ": "),
                       as.numeric(factor(packet))),
             ncol = 3, scales = "free") +
  geom_pointrange(aes(ymin = Lower, ymax = Upper)) +
  geom_text(aes(label = paste0("(n=", n, ")"), y = Lower), 
            size = 2, nudge_x = 0.15, hjust = 0) +
  scale_x_discrete(expand = c(0, 1)) +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        legend.position = "top") +
  labs(title = "mean subscale scores by site",
       subtitle = "ordered by packet (P1, P2, P3)\nerror bars are bootstrapped 95% confidence intervals",
       x = "site", color = "site",
       y = "mean subscale score (range varies by subscale)")
```

# Cluster analysis

```{r}
cor_by_site <- d_long_subscale_boot %>%
  filter(!grepl("thai", subscale)) %>% 
  mutate(subscale = ifelse(subscale == "inner speech",
                           paste(as.character(subscale),
                                 gsub("packet ", "p", packet),
                                 sep = "_"),
                           as.character(subscale))) %>%
  distinct(ctry, subscale, Mean) %>%
  filter(subscale != "inner voice") %>%
  ungroup() %>%
  spread(subscale, Mean) %>%
  data.frame() %>%
  column_to_rownames("ctry")
```

## ... of sites

Now let's try a formal analysis for determining how similar these 5 sites are to each other. 

Hierarchical clustering basically works like this: If there are 5 things - A, B, C, D, and E - this analysis will try to group pairs of like things together. E.g., first it might pair A and D together, then it will pretend that there are just 4 things - AD, B, C, and E. Then it might pair B and C together, and then pretend that there are just 3 things - AD, BC, and E. Then it might pair AD and BC together, and then pretend that there are just 2 things: ADBC and E. It will do this until there is just one "thing" left.

So to read this plot, look for the pairings. Sites that are on the same "branch" in this "dendrogram" showed similar patterns of means on the subscales. The closer together they are on that branch, the more similar they were.

```{r, fig.width = 2, fig.asp = 0.6}
clust_sites <- hclust(dist(cor_by_site))
ggdendrogram(clust_sites) +
  labs(title = "hierarchical clustering of sites",
       subtitle = "using mean subscale scores by site")
```

## ... of subscales

Now let's do the same thing for the subscales: Which subscales "hang together"? 

To read this plot, again, look for the pairings. Subscales that are on the same "branch" in this "dendrogram" showed similar patterns of means across the 5 sites. The closer together they are on that branch, the more similar they were.

```{r, fig.width = 4, fig.asp = 0.6}
clust_subscales <- hclust(dist(t(cor_by_site)))
ggdendrogram(clust_subscales) +
  labs(title = "hierarchical cluster of subscales",
       subtitle = "using mean subscale scores by site")
```

# Look at correlations among subscales by site

Now let's take a closer look at which subscales seem to "hang together," by looking at the correlations between scales in their means for each site. 

To read this plot, find one scale on the x-axis, and another scale on the y-axis. The value (and color) at that point in the grid shows you the correlation between these two subscale means across the 5 sites. The maximum possible correlation is +1 (red), and the minimum is -1 (blue).

I ordered the scales according to the cluster analysis we just did above, so you can see "patches" of scales that all tended to have similar patterns of means across sites.

```{r, fig.width = 7, fig.asp = 1}
d_long_subscale_boot %>%
  filter(!grepl("thai", subscale)) %>% 
  mutate(subscale = ifelse(subscale == "inner speech",
                           paste(as.character(subscale),
                                 gsub("packet ", "p", packet),
                                 sep = "_"),
                           as.character(subscale))) %>%
  distinct(ctry, subscale, Mean) %>%
  ungroup() %>%
  spread(subscale, Mean) %>%
  data.frame() %>%
  column_to_rownames("ctry") %>%
  cor(use = "pairwise.complete.obs") %>%
  # corrplot version:
  # corrplot::corrplot(method = "color", tl.col = "black",
  #                    addCoef.col = "black", order = "hclust",
  #                    col = RColorBrewer::brewer.pal(n = 11, name = "PRGn"),
  #                    title = "\ncorrelations between subscale means (by country), ordered by hierarchical clustering")

  # ggplot version:
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  mutate(subscaleA_order = as.numeric(factor(as.numeric(factor(subscaleA)),
                                             levels = clust_subscales$order))) %>%
  gather(subscaleB, cor, -subscaleA, -subscaleA_order) %>%
  mutate(subscaleB_order = as.numeric(factor(as.numeric(factor(subscaleB)),
                                             levels = clust_subscales$order))) %>%
  ggplot(aes(x = reorder(subscaleA, desc(subscaleA_order)), 
             y = reorder(subscaleB, desc(subscaleB_order)), 
             fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 50),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "correlations among mean subscale scores, by site",
       subtitle = "using pairwise complete observations\nordered via hierarchical clustering analysis",
       fill = "correlation\ncoeff. (r)",
       x = "",
       y = "")
```

# Look at correlations among subscales by individuals

We could do the same thing thinking about individual participants instead of sites - but it's important to keep in mind that most people didn't fill out all `r length(levels(factor(d_long_subscale$subscale))) - 1` subscales! So we'll focus on just looking at how the subscales within each packet (Packet 1, 2, or 3) "hang together" for the people who completed that packet.

As above, to read these plots, find one scale on the x-axis, and another scale on the y-axis. The value (and color) at that point in the grid shows you the correlation between these two subscale means across the 5 sites. The maximum possible correlation is +1 (red), and the minimum is -1 (blue). (Note that these scales are being presented in alphabetical order, *not* by any sort of clustering analysis.)

## Packet 1

```{r fig.width = 3, fig.asp = 1}
d_long_subscale %>%
  filter(packet == 1, !is.na(sum_score)) %>%
  spread(subscale, sum_score) %>%
  select(-c(ctry, wher, recr, whoc, attn)) %>%
  mutate(subj = paste(subj, packet, version, sep = "_")) %>%
  select(-packet, -version) %>%
  distinct() %>%
  remove_rownames() %>%
  column_to_rownames("subj") %>%
  cor(use = "pairwise.complete.obs") %>%
  # corrplot version:
  # corrplot::corrplot(method = "color", tl.col = "black",
  #                    addCoef.col = "black", order = "hclust",
  #                    col = RColorBrewer::brewer.pal(n = 11, name = "PRGn"),
  #                    title = "\nPACKET 1: correlations among subscales")

  # ggplot version:
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  gather(subscaleB, cor, -subscaleA) %>%
  ggplot(aes(x = subscaleA, y = subscaleB, fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 6),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "PACKET 1: correlations among subscales",
       subtitle = "using pairwise complete observations",
       x = "",
       y = "")
```

## Packet 2

```{r fig.width = 3, fig.asp = 1}
d_long_subscale %>%
  filter(packet == 2, !is.na(sum_score)) %>%
  spread(subscale, sum_score) %>%
  select(-c(ctry, wher, recr, whoc, attn)) %>%
  mutate(subj = paste(subj, packet, version, sep = "_")) %>%
  select(-packet, -version) %>%
  distinct() %>%
  remove_rownames() %>%
  column_to_rownames("subj") %>%
  cor(use = "pairwise.complete.obs") %>%
  # corrplot version:
  # corrplot::corrplot(method = "color", tl.col = "black",
  #                    addCoef.col = "black", order = "hclust",
  #                    col = RColorBrewer::brewer.pal(n = 11, name = "PRGn"),
  #                    title = "\nPACKET 2: correlations among subscales")

  # ggplot version:
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  gather(subscaleB, cor, -subscaleA) %>%
  ggplot(aes(x = subscaleA, y = subscaleB, fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 6),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "PACKET 2: correlations among subscales",
       subtitle = "using pairwise complete observations",
       x = "",
       y = "")
```

## Packet 3

```{r fig.width = 4, fig.asp = 1}
d_long_subscale %>%
  filter(packet == 3, !is.na(sum_score)) %>%
  spread(subscale, sum_score) %>%
  select(-c(ctry, wher, recr, whoc, attn)) %>%
  mutate(subj = paste(subj, packet, version, sep = "_")) %>%
  select(-packet, -version) %>%
  distinct() %>%
  remove_rownames() %>%
  column_to_rownames("subj") %>%
  cor(use = "pairwise.complete.obs") %>%
  # corrplot version:
  # corrplot::corrplot(method = "color", tl.col = "black",
  #                    addCoef.col = "black", order = "hclust",
  #                    col = RColorBrewer::brewer.pal(n = 11, name = "PRGn"),
  #                    title = "\nPACKET 3: correlations among subscales")

  # ggplot version:
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  gather(subscaleB, cor, -subscaleA) %>%
  ggplot(aes(x = subscaleA, y = subscaleB, fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 6),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "PACKET 3: correlations among subscales",
       subtitle = "using pairwise complete observations",
       x = "",
       y = "")
```

## All packets

Just for fun, here's a look at the correlations among individual participants' subscale scores across all scales. **Note that some of these pairs of subscales probably have very few observations going into these correlations!! So take this with a grain of salt.**

```{r, fig.width = 6, fig.asp = 1}
d_long_subscale %>%
  spread(subscale, sum_score) %>%
  select(-c(ctry, wher, recr, whoc, attn)) %>%
  mutate(subj = paste(subj, packet, version, sep = "_")) %>%
  select(-packet, -version) %>%
  distinct() %>%
  remove_rownames() %>%
  column_to_rownames("subj") %>%
  cor(use = "pairwise.complete.obs") %>%
  data.frame() %>%
  rownames_to_column("subscaleA") %>%
  gather(subscaleB, cor, -subscaleA) %>%
  ggplot(aes(x = subscaleA, y = subscaleB, fill = cor, label = round2(cor))) +
  geom_tile() +
  geom_text(size = 3) +
  scale_fill_distiller(guide = guide_colorbar(barheight = 6),
                       palette = "RdYlBu", limits = c(-1, 1)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "PACKETS 1-3: correlations among subscales",
       subtitle = "using pairwise complete observations",
       x = "",
       y = "")
```

# Focusing on absorption & spiritual events

Let's look at relationships between absorption & spiritual events across individual participants, but separating out by site. Tanya's strong prediction would be that higher absorption (as captured by the Tellegen scale, `exwl`) would be correlated with more spiritual events (as indxed by the Luhrmann scale, `spev`).

Here's a test for the overall correlation:

```{r}
cor1 <- with(d_wide_subscale, cor.test(exwl, spev)); cor1
```

Looks like this prediction is born out by the data! Absorption was significantly positively correlated with reported spiritual events ($r$ = `r round2(cor1$estimate)`, $p$ = `r ifelse(cor1$p.value < 0.001, "<0.001", format(round(cor1$p.value, 3), digits = 3))`), accounting for approximately `r round(cor1$estimate^2 * 100)`% of the variance in reported spirtual events.

Now let's take a look at what's going on in each site with a regression analysis:

```{r, results = "asis"}
# effect-code site
contrasts(d_wide_subscale$ctry) <- cbind(ghana = c(-1, 1, 0, 0, 0),
                                         thailand = c(-1, 0, 1, 0, 0),
                                         china = c(-1, 0, 0, 1, 0),
                                         vanuatu = c(-1, 0, 0, 0, 1))

r1 <- lm(spev ~ exwl_cent * ctry, 
         d_wide_subscale %>% mutate(exwl_cent = scale(exwl, scale = F)))

r1_coef <- summary(r1)$coefficients %>%
  data.frame() %>%
  rownames_to_column("param") %>%
  rename(b = Estimate, se = Std..Error, t = t.value, p = Pr...t..) %>%
  mutate_at(vars(b, se, t), funs(round2)) %>%
  mutate(p = ifelse(p < 0.001, "<0.001",
                    format(round(p, 3), digits = 3)),
         param = factor(param,
                        levels = c("(Intercept)", 
                                   "ctryghana", 
                                   "ctrythailand",
                                   "ctrychina", 
                                   "ctryvanuatu", 
                                   "exwl_cent",
                                   "exwl_cent:ctryghana", 
                                   "exwl_cent:ctrythailand", 
                                   "exwl_cent:ctrychina", 
                                   "exwl_cent:ctryvanuatu"),
                        labels = c("baserate (overall)", 
                                   "baserate (ghana vs. overall)",
                                   "baserate (thailand vs. overall)",
                                   "baserate (china vs. overall)",
                                   "baserate (vanuatu vs. overall)",
                                   "absorption (overall)",
                                   "absorption (ghana vs. overall)",
                                   "absorption (thailand vs. overall)",
                                   "absorption (china vs. overall)",
                                   "absorption (vanuatu vs. overall)"))) %>%
  arrange(param) %>%
  column_to_rownames("param")

knitr::kable(r1_coef)
```

These results indicate that, as predicted, there was a significant positive relationship between absorption and reported spiritual events, collapsing across sites ($b$ = `r r1_coef["absorption (overall)", "b"]`, $p$ = `r r1_coef["absorption (overall)", "p"]`). The relationship between absorption and spiritual events was particularly strong among participants in Ghana ($b$ = `r r1_coef["absorption (ghana vs. overall)", "b"]`, $p$ = `r r1_coef["absorption (ghana vs. overall)", "p"]`), and (perhaps) weaker than average among participants in China ($b$ = `r r1_coef["absorption (china vs. overall)", "b"]`, $p$ = `r r1_coef["absorption (china vs. overall)", "p"]`).

Note also that there are overall differences in the baserates for reporting spiritual events: Participants in Ghana and Vanuatu generally reported more than average (Ghana: $b$ = `r r1_coef["baserate (ghana vs. overall)", "b"]`, $p$ = `r r1_coef["baserate (ghana vs. overall)", "p"]`; Vanuatu: $b$ = `r r1_coef["baserate (vanuatu vs. overall)", "b"]`, $p$ = `r r1_coef["baserate (vanuatu vs. overall)", "p"]`), while participants in Thailand and China generally reported fewer than average (Thailand: $b$ = `r r1_coef["baserate (thailand vs. overall)", "b"]`, $p$ = `r r1_coef["baserate (thailand vs. overall)", "p"]`; China: $b$ = `r r1_coef["baserate (china vs. overall)", "b"]`, $p$ = `r r1_coef["baserate (china vs. overall)", "p"]`).

Here's a visualization of these relationships:

```{r, fig.width = 4, fig.asp = 0.375}
ggplot(d_wide_subscale %>%
         filter(!is.na(exwl), !is.na(spev)) %>%
         mutate(ctry = factor(ctry, 
                              levels = c("us", "ghana", "thailand", 
                                         "china", "vanuatu")),
                exwl_cent = scale(exwl, scale = F)),
       aes(x = exwl, # x = exwl_cent, 
           y = spev, color = ctry)) +
  facet_grid(~ ctry) +
  geom_point(size = 1, alpha = 0.5) +
  geom_smooth(method = "lm", color = "black") +
  # geom_line(data = fortify(r1), aes(x = exwl_cent, y = .fitted), color = "black") +
  scale_color_brewer(palette = "Dark2") +
  xlim(0, 34) +
  ylim(0, 88) +
  theme_minimal() +
  theme(legend.position = "none") +
  labs(title = "relationship between absorption & spiritual events, by site",
       color = "site",
       x = "absorption (tellegen)",
       y = "spiritual events (luhrmann)")
```

# Dualism

Here's just some initial attempt to take a look at how the dualism questions hang together... more on this later.

```{r, fig.width = 5, fig.asp = 1}
library(ggdendro)
temp <- d_long %>%
  filter(grepl("minw_", question), !is.na(response)) %>%
  select(subj, question, response) %>%
  left_join(question_key %>% 
              select(question_label_universal, 
                     question_text, byhand_subscale) %>% 
              rename(question = question_label_universal)) %>%
  distinct(subj, question_text, response) %>%
  spread(question_text, response) %>%
  data.frame() %>%
  remove_rownames() %>%
  column_to_rownames("subj")
  
temp_clust <- hclust(dist(t(temp)))
ggdendrogram(temp_clust, rotate = TRUE)
```

